Abstract

Using a metric related to the returns correlation, a method is
proposed to reconstruct an economic space from the market data. A
reduced subspace, associated to the systematic structure of the
market, is identified and its dimension related to the number of terms
in factor models. Examples were worked out involving sets of companies
from the DJIA and S&P500 indexes.

Having a metric defined in the space of companies, network topol- 
ogy coefficients may be used to extract further information from the 
data. A notion of "continuous clustering" is defined and empirically 
related to the occurrence of market shocks. 

1 Introduction

In spite of the important achievements obtained in finance theory (see for
example http://welch.som.yale.edu/academics/toptenfinance.html and ch. 35
in Ref. [1]) nobody claims that the fundamental laws of the economic process
are known. A set of fundamental laws under which all economic relations
might be interpreted is certainly not known and, even if such laws were to 
exist, we do not know how to infer from the data what are the variables that 
play the relevant role in the equations. Instead, economic theory generally 
establishes, a priori, the models as sets of restrictions in order to proceed 
to statistical tests of the data. Most of the developments in finance theory 
follow this line. 

The dominant views, such as the Efficient Market Hypothesis, based on 
the work of Samuelson [2] and Fama [3], and the derived models, such as the
multifactor capital asset pricing model (CAPM) and arbitrage pricing 
theory (APT) ||, assess the evolution of financial markets as the result of 
the rational action of informed agents faced with Brownian processes. These 
models provide conceptual insights on the issues of pricing and portfolio 
selection, although attempts to test them have been hindered by the inability
to find a reliable set of factors to explain the securities return data. Chen 
et al. || have attempted to establish statistical correlations between some 
economic facts (like unanticipated changes in industrial production, interest 
rates or inflation) and asset returns, to identify the economic forces that 
are driving the market. But the very identification of such forces, and the 
rationale for its theoretical underpinnings, is also controversial. 

Mandelbrot, who studied the properties of stable distributions other than 
the Gaussian, applied new statistical methods to financial series, suggested 
the existence of low frequency dependence in the stock market data and chal- 
lenged the dominance of Brownian processes |?|]. Indeed, Mandelbrot inter- 
preted the fat tails in the distribution of changes of prices and the empirical 
evidence of sharp discontinuities in the evolution of these markets as evi- 
dence for the presence of a stable distribution. Instead, his critics argued 
that the financial series should be interpreted as a result of variables with 
typically high frequency variance, such as serial correlation and Markov de- 
pendence. Consequently the fat tails of the distribution of price changes 
could be explained by subordinate stochastic processes, in particular by time 
varying variances of Gaussian processes, rather than by stable distributions 
or truncated Levy processes. 

Mandelbrot's views were understood as a criticism of the conventional
wisdom on the nonexistence of structure in the evolution of stock markets, and
were generally rejected. Also working from market data, we address this
topic of debate with a different approach. Instead of establishing correlations
with predefined factors, our point of view is that it may be possible to extract
from the data itself, if not the economic variables, at least their geometrical
relations, and that such an exploration might be fruitful for statistical
analysis. The idea is simply stated in the following terms:

(i) Pick a representative set of N stocks and their historical data of returns 
over some time period. 

(ii) From the returns data, using an appropriate metric, compute the 
matrix of distances between the N stocks. 

The problem now is reduced to an embedding problem where, given a set 
of distances between points, one asks what is the smallest manifold that con- 
tains the set. Given a graph G and an allowed distortion there are algorith- 
mic techniques to map the graph vertices to a normed space in such a way 
that distances between the vertices of G match the distances between their 
geometric images, up to the allowed distortion. However, these techniques 
are not directly applicable to our problem because in the distances between 
assets, computed from their return fluctuations, there are systematic and 
unsystematic contributions. Therefore, to extract factor information from 
the market, we have somehow to separate these two effects. The following 
stochastic geometry technique is used: 

(iii) From the matrix of distances, compute coordinates for the stocks
in a Euclidean space of dimension $N-1$. (For a degenerate matrix the
embedding dimension may be smaller.)

(iv) The stocks are now represented by a set $\{x_i\}$ of points in $R^{N-1}$, to
which we assign masses $\{m_i\}$ equal to their market capitalizations.

(v) To this cloud of weighted points we apply the standard analysis of 
reduction of their coordinates to the center of mass and computation of the 
eigenvectors of the inertial tensor. 

(vi) The same technique is also applied to surrogate data, namely to data 
obtained by independent time permutation for each stock and to random 
data with the same mean and covariance. 

(vii) The eigenvalues in (v) are compared with those of (vi). The direc- 
tions for which the eigenvalues are significantly different are now identified 
as the market systematic variables. 
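
The two kinds of surrogate data in step (vi) can be sketched as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, and the random surrogate is drawn independently per stock with that stock's mean and variance, which is one reading of the description and should be treated as an assumption.

```python
import numpy as np

def time_permuted(returns, rng):
    """Shuffle each stock's return series independently in time.

    returns: array of shape (T, N), one column per stock.
    Each column keeps its own distribution, but cross-stock
    synchronization is destroyed.
    """
    returns = np.asarray(returns, dtype=float)
    return np.column_stack([rng.permutation(col) for col in returns.T])

def independent_gaussian(returns, rng):
    """Gaussian surrogate with the same per-stock mean and variance.

    Assumption: columns are drawn independently of each other.
    """
    returns = np.asarray(returns, dtype=float)
    return rng.normal(returns.mean(axis=0), returns.std(axis=0),
                      size=returns.shape)
```

Running the same embedding and eigenvalue computation on these surrogates gives the reference spectra against which the actual eigenvalues are compared in step (vii).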

Using weights (masses) proportional to the market capitalizations, we are
attempting to identify the empirically constructed variables that drive the
market, and the number of surviving eigenvalues is the effective dimension of
this economic space. Of course, what such a procedure reconstructs is the
economic space associated to the set of stocks that is considered, not to the
full market. Even if a very large set of financial assets is used, there is no
implied claim that financial markets fully reflect all that we would like to
know about macroeconomics. All one is trying to do here is to reconstruct
an economic space, not the economic space.

The same technique may be used to infer factors for portfolio hedging 
purposes. In this case there is no reason to include weights and all companies 
may be considered to have the same weight. We will have examples of both 
types of calculation. 

In a recent paper Gopikrishnan et al. || used similar techniques, although 
with a different perspective. Diagonalizing the correlation matrix (which is 
related to the metric we use) they have tried to identify particular eigen- 
vectors with the traditional industrial sectors. In our analysis the economic 
dimensions may or may not correspond to economic sectors or to other known 
economic factors or to any combination of them. It is up to the data to say 
what they are, independently of any previously established concepts. 

In Section 2 the method is explained in detail and then, as an example, 
it is applied to market data of a set of large companies that are or have been 
in the Dow Jones Industrial Average and S&P500 indexes. 

2 Reconstruction of an economic space 

2.1 The market metric 

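From the returns $r(k)$ for each security

$$ r_t(k) = \log(p_t(k)) - \log(p_{t-1}(k)) \tag{1} $$

one defines a normalized vector

$$ \vec{\rho}(k) = \frac{\vec{r}(k) - \langle \vec{r}(k) \rangle}{\sqrt{n \left( \langle r^2(k) \rangle - \langle r(k) \rangle^2 \right)}} \tag{2} $$

$n$ being the number of components (number of time labels) in the vectors
$\vec{r}(k)$. With this vector one defines the distance between the securities $k$ and
$l$ by the Euclidean distance of the normalized vectors

$$ d_{kl} = \left\| \vec{\rho}(k) - \vec{\rho}(l) \right\| = \sqrt{2 \left( 1 - C_{kl} \right)} \tag{3} $$

$C_{kl}$ being the correlation coefficient of the returns

$$ C_{kl} = \frac{\langle r(k) r(l) \rangle - \langle r(k) \rangle \langle r(l) \rangle}{\sqrt{\left( \langle r^2(k) \rangle - \langle r(k) \rangle^2 \right) \left( \langle r^2(l) \rangle - \langle r(l) \rangle^2 \right)}} \tag{4} $$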



Being a Euclidean distance between two vectors, $d_{kl}$ in Eq. (3) satisfies the usual
distance axioms. It is the distance between market securities that was
proposed in Refs. [10] and [11].


This distance is related to the covariances and much of what we discuss
below could be carried out in a purely statistical setting. However, the fact
that $d_{kl}$ is a properly defined distance gives a meaning to geometric notions
and geometric tools in the study of the market.
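
As an illustration of the metric of this section, the following sketch computes log returns from a price table, normalizes each return series to a unit vector and evaluates the pairwise Euclidean distances. The function name `market_distances` and the (time, security) array layout are assumptions made for the example.

```python
import numpy as np

def market_distances(prices):
    """Distance and correlation matrices from a price table.

    prices: array of shape (T + 1, N), one column per security.
    Returns (d, C), both of shape (N, N).
    """
    prices = np.asarray(prices, dtype=float)
    r = np.diff(np.log(prices), axis=0)        # log returns, shape (T, N)
    centered = r - r.mean(axis=0)
    # Dividing by sqrt(n * variance) makes every column a unit vector.
    rho = centered / np.sqrt((centered ** 2).sum(axis=0))
    C = rho.T @ rho                            # correlation coefficients
    # Euclidean distance between unit vectors: sqrt(2 (1 - C)).
    d = np.sqrt(np.clip(2.0 * (1.0 - C), 0.0, None))
    return d, C
```

The `np.clip` call only guards against tiny negative values produced by rounding on the diagonal; it does not change the definition of the distance.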

2.2 Characteristic dimensions, systematic covariance 
and factors 

After the distances are computed for the set of $N$ securities, they are
embedded in $R^{N-1}$ with coordinates $\{\vec{x}(k)\}$. The center of mass $\vec{R}$ is computed,

$$ \vec{R} = \frac{\sum_k m_k \vec{x}(k)}{\sum_k m_k} \tag{5} $$

the coordinates reduced to the center of mass,

$$ \vec{y}(k) = \vec{x}(k) - \vec{R} \tag{6} $$

and the inertial tensor

$$ T_{ij} = \sum_k m_k \, y_i(k) \, y_j(k) \tag{7} $$

is diagonalized, the set of eigenvalues and normalized eigenvectors being
$\{\lambda_i, \vec{e}_i\}$. The eigenvectors define the characteristic directions of the
weighted set of securities and their $z_i(k)$ coordinates along these directions
are obtained by projection

$$ z_i(k) = \vec{y}(k) \cdot \vec{e}_i \tag{8} $$
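
The embedding and diagonalization steps can be sketched as below. The text does not say which embedding algorithm is used; classical multidimensional scaling is swapped in here as one standard way of recovering coordinates from a Euclidean distance matrix, and the function names are hypothetical.

```python
import numpy as np

def embed(d):
    """Classical MDS: coordinates in R^(N-1) reproducing the distances d."""
    d = np.asarray(d, dtype=float)
    n = d.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (d ** 2) @ J               # Gram matrix of centered points
    w, U = np.linalg.eigh(B)
    order = np.argsort(w)[::-1][: n - 1]      # drop the (near-)zero mode
    w, U = np.clip(w[order], 0.0, None), U[:, order]
    return U * np.sqrt(w)                     # row k holds x(k)

def characteristic_directions(x, masses=None):
    """Eigenvalues, eigenvectors and projections of the weighted cloud."""
    x = np.asarray(x, dtype=float)
    m = np.ones(x.shape[0]) if masses is None else np.asarray(masses, dtype=float)
    R = (m[:, None] * x).sum(axis=0) / m.sum()    # center of mass
    y = x - R                                     # reduced coordinates
    T = (m[:, None] * y).T @ y                    # inertial tensor
    lam, e = np.linalg.eigh(T)
    order = np.argsort(lam)[::-1]                 # largest eigenvalue first
    lam, e = lam[order], e[:, order]
    z = y @ e                                     # z[k, i] = y(k) . e_i
    return lam, e, z
```

Passing `masses=None` gives the equal-weight variant mentioned for hedging purposes; passing market capitalizations gives the weighted one.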

As stated before, the most relevant characteristic directions for our
purposes are those that correspond to the eigenvalues which are clearly different
from those obtained from surrogate or random data. They define a subspace
$V_d$ of dimension $d$. This $d$-dimensional subspace carries the (systematic)
information related to the market correlation structure.

In portfolio optimization models of the mean-variance type, one usually 
distinguishes between the systematic and unsystematic (or specific) contri- 
butions to the portfolio risk. The former are associated to the correlations 
between the assets in the portfolio and the latter to the individual variances 
alone. Using our construction we find that part of the correlations
contribution is indistinguishable from random data. Hence the market (systematic)
structure is carried by a smaller $d$-dimensional subspace. This suggests the
definition of a market dimension $d$ and a systematic covariance.

Denote by $\vec{z}(k)^{(d)}$ the restriction of the $k$-th asset to the subspace $V_d$ and
by $d_{kl}^{(d)}$ the distances restricted to this space. Then using Eqs. (3) and (4) we
may define a notion of systematic covariance $\sigma_{kl}^{(d)}$

$$ \sigma_{kl}^{(d)} = \mu_k \sqrt{\sigma_{kk} - \bar{r}_k^2} \; \mu_l \sqrt{\sigma_{ll} - \bar{r}_l^2} \left( 1 - \frac{1}{2} \left( d_{kl}^{(d)} \right)^2 \right) \tag{9} $$

where $\mu_k = |\vec{z}(k)^{(d)}| / |\vec{z}(k)|$, $\bar{r}_k = \langle r(k) \rangle$ and $\sigma_{kk} = \langle r(k) r(k) \rangle$.
In a portfolio optimization problem

$$ r = \sum_k W_k \, r(k) \tag{10} $$

the function to be minimized would be

$$ \sum_{k \neq l} \sigma_{kl}^{(d)} W_k W_l + \sum_k \sigma_{kk} W_k^2 \tag{11} $$

identical to the classical Markowitz problem, but with the systematic
covariance part restricted to the subspace $V_d$.
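
A minimal sketch of the systematic covariance follows. It reads Eq. (9) as the product of the two restriction ratios, the two return standard deviations and one minus half the squared restricted distance. The function name and the array layout (projections ordered by decreasing eigenvalue) are assumptions.

```python
import numpy as np

def systematic_covariance(returns, z, d_dim):
    """Systematic covariance restricted to the leading d_dim directions.

    returns: (T, N) array of returns.
    z: (N, D) array, row k holding the projections of asset k,
       columns ordered by decreasing eigenvalue.
    d_dim: number of retained (systematic) directions.
    """
    returns = np.asarray(returns, dtype=float)
    z = np.asarray(z, dtype=float)
    zd = z[:, :d_dim]
    # Fraction of each asset's length that lies in the subspace.
    mu = np.linalg.norm(zd, axis=1) / np.linalg.norm(z, axis=1)
    # Pairwise distances restricted to the subspace.
    dist_d = np.linalg.norm(zd[:, None, :] - zd[None, :, :], axis=-1)
    std = returns.std(axis=0)                  # sqrt(<r^2> - <r>^2)
    s = mu * std
    return np.outer(s, s) * (1.0 - 0.5 * dist_d ** 2)
```

When all directions are retained, the restriction ratios equal one and the restricted distances equal the full ones, so the expression reduces to the ordinary (population) covariance of the returns. This is a useful consistency check on the formula.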

This analysis also provides a rationale for the choice of the number of 
terms in the construction of factor models, the factors being constructed 
from the leading characteristic dimensions (see the example below). 

2.3 Clustering



In addition to a detailed subspace analysis of the economic space, existence
of a market metric provides network topology coefficients to characterize the
whole space. One such notion is clustering, a meaningful well-known notion
in graph theory. Using the distance matrix $d_{ij}$ (Eq. (3)) to construct the
minimal spanning tree connecting the N securities, as in Mantegna [III], we 
might then apply the graph theoretical notion of clustering to the spanning 
tree. However this construction neglects part of the information contained in 
the distance matrix. Instead we introduce a notion of continuous clustering 
as follows: 

$d_{ij}$ being the distance between the securities $i$ and $j$ and $\bar{d}$ the average
distance, we define a function

$$ V_{ij} = \exp\left( -\frac{d_{ij}}{\bar{d}} \right) \tag{12} $$

which represents the neighbor degree of the securities $i$ and $j$. A (continuous)
clustering coefficient is then defined by

$$ C = \frac{1}{N(N-1)(N-2)} \sum_{i \neq j \neq k} V_{ij} V_{jk} V_{ik} \tag{13} $$
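
The continuous clustering coefficient can be evaluated with a few matrix operations, as in the sketch below. Two points are assumptions of the example rather than statements of the text: the average distance is taken over distinct pairs only, and the sum runs over ordered triples of distinct securities, which matches the normalization N(N-1)(N-2).

```python
import numpy as np

def continuous_clustering(d):
    """Continuous clustering coefficient of a distance matrix d (N x N)."""
    d = np.asarray(d, dtype=float)
    n = d.shape[0]
    off_diagonal = ~np.eye(n, dtype=bool)
    d_bar = d[off_diagonal].mean()         # average distance, distinct pairs
    V = np.exp(-d / d_bar)                 # neighbor degree of each pair
    np.fill_diagonal(V, 0.0)               # removes terms with repeated indices
    # With a zero diagonal, trace(V^3) is the sum of V_ij V_jk V_ki over
    # ordered triples of distinct indices.
    return np.trace(V @ V @ V) / (n * (n - 1) * (n - 2))
```

Because the distances are divided by their own average, the coefficient is unchanged when all distances are rescaled by a common factor. It responds to how unevenly the securities are spread, not to the overall level of correlation.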



3 An example 

We have considered the following 34 large companies which are, or have been, 
in the Dow Jones Industrial Average (DJIA) index: 

Alcoa (AA), Honeywell (HON), American Express (AXP), AT&T (T), 
Boeing (BA), Caterpillar (CAT), Chevron (CHV), Coca-Cola (KO), Dupont 
Nemours (DD), Eastman Kodak (EK), Exxon (XON), General Electric (GE), 
Goodyear (GT), IBM (IBM), International Paper (IP), McDonalds (MCD), 
Merck (MRK), Minnesota Mining (MMM), General Motors (GM), Philip 
Morris (MO), Procter & Gamble (PG), Sears (S), Texaco (TX), United 
Technologies (UTX), Citigroup (C), Hewlett-Packard (HWP), Home Depot 
(HD), Intel (INTC), J. P. Morgan Chase (JPM), Johnson & Johnson (JNJ), 
Microsoft (MSFT), SBC Communications (SBC), Wal-Mart (WMT), Walt 
Disney (DIS). 

Figure 1: Eigenvalue distributions for the actual, time-permuted and random
data (panels: Actual data, Time-permuted data, Random data,
Actual+(1-Random))

The companies will be denoted by their ticker symbols and we use daily data
for the time period from September 1990 to August 2000.

Using the whole data for the ten years to define the vectors $\vec{\rho}(k)$ for
each company, the calculations described in Section 2 have been performed
for the actual returns data, for the time-permuted data and for random data
with the same mean and variance as the actual data. In all cases we have
performed the calculations with and without weights. The ordered
eigenvalue distributions that were obtained are shown in Fig. 1. The conclusion is
that the (systematic) market structure is contained in the first five
dimensions. That is, these dimensions capture the structure of the deterministic
correlations and economic trends that are driving the market, whereas the
remainder of the market space may be considered as being generated by
random fluctuations. For this market, these five dimensions define our
empirically constructed economic variables.

To have a qualitative idea concerning the structure of the characteristic
dimensions, we have plotted in Figs. 2 and 3 the projections of the (weighted)
stocks along the directions of the first eight eigenvectors. On the x-axis the
companies are ordered according to their standard industrial code. Although
some companies in the same sector (for example the oil companies) have
similar projections on the dominant eigenvectors, this is not at all true for all
sectors, nor for all companies. The association of companies working on different
products in the same one or two-dimensional subspace confirms that the
factors that drive the market cannot be identified with economic sectors.
Notice that being in the same market subspace does not mean being close to
each other, and some interesting anticorrelation effects are clear in Figs. 2
and 3. This may be important for developing portfolio hedging strategies.

Figure 2: Projection of the (weighted) stocks along the first four eigenvectors

To test the stability of the economic structure inferred from the market,
we have divided the data into three chronologically successive batches and
performed the same operations. The behavior of the eigenvalue distributions
is very much the same. In Fig. 4 we have plotted the three-dimensional
subspaces associated to the three largest eigenvalues. Apart from statistical
fluctuations, the reconstructed spaces show a reasonable degree of stability.
However, similarity of the figures is only apparent with a permutation of
the axes between the first and the second plot. The ordering of the largest
eigenvalues changes in time although the overall distribution remains
approximately the same. This change in ordering may have an economic meaning
and be related to the relative importance and stability of groups of
companies in different periods of expansion or recession. What is interesting,
however, is the relative stability of the company positions and the size and
distribution of the eigenvalues. It is as if the effective dimensionality of the
space remained the same but with a pulsating effect on its shape.

Figure 3: Projection of the (weighted) stocks along the eigenvectors 5 to 8

To test the dependence of the characteristic dimensions of the space on
the number of companies, we have added to our set data for the same ten-year
period for 36 other large companies represented in the S&P500 index,
namely (ticker symbols only):

ABT, MHP, MEL, NYT, NKE, OXY, PEP, PFE, PHA, CBE, ADBE, 
APA, ASH, AAPL, BAC, BK, BAX, BDK, CL, XRX, DCN, DAL, DG, SYY, 
F, G, HAL, EOG, HLT, RBK, SGP, SLB, UNP, UIS, WHR, GDW. 

Performing the same analysis as before for this larger set of 70 companies,
we have found that the number of relevant eigenvectors grows from five to
six. The small increase in the number of relevant characteristic dimensions
for a set with double the size of the first one, and which covers a wider range
of products, is quite remarkable. It seems to indicate that the systematic
factors in the market are relatively few and, furthermore, that they may be
empirically defined.

Figure 4: The leading 3-dimensional subspaces associated to three
chronologically successive periods (surviving panel titles: Sep. 90 - Dec. 93,
Jan. 94 - Apr. 97)

Finally we illustrate the computation of a set of empirical factors from
the geometrical analysis of the first set of 34 companies. A factor model for
the returns is

$$ r_i = a_i + \sum_{k=0}^{5} b_{ki} f_k + \varepsilon_i \tag{14} $$

where the $a_i$ are called the intercepts, $b_{ki}$ the factor loadings and $\varepsilon_i$ the
residual random terms.
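
Given factor time series, the intercepts, loadings and residual variances of such a model can be estimated as in the sketch below. The text does not state the estimation procedure; ordinary least squares is used here as an assumption, and the function name is hypothetical.

```python
import numpy as np

def fit_factor_model(returns, factors):
    """Least-squares fit of a linear factor model.

    returns: (T, N) array, one column per company.
    factors: (T, K) array, one column per factor.
    Returns intercepts (N,), loadings (K, N) and residual variances (N,).
    """
    returns = np.asarray(returns, dtype=float)
    factors = np.asarray(factors, dtype=float)
    # Design matrix: a column of ones for the intercept, then the factors.
    X = np.column_stack([np.ones(len(factors)), factors])
    coef, *_ = np.linalg.lstsq(X, returns, rcond=None)
    residuals = returns - X @ coef
    return coef[0], coef[1:], residuals.var(axis=0)
```

Comparing the residual variances with the total variance of each return series gives the unexplained fraction of variance for each company.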

Recall that the first step in our analysis was the embedding of the 34
companies as a set of points in a 33-dimensional space. The company coordinates
are then reduced to the center of mass (Eq. (5)) and for the computation of
the factors we consider equal masses $m_k$. The vectors $y_i(t)$ denote the time
series reduced to the center of mass,

$$ y_i(t) = r_i(t) - \bar{r}(t) $$

The zero-factor $f_0$ is simply the average

$$ f_0(t) = \bar{r}(t) = \frac{1}{34} \sum_{i=1}^{34} r_i(t) \tag{15} $$

When the 5 relevant directions are identified, one obtains a 5-dimensional
subspace in a 33-dimensional space. The 5 factors are simply the 5
eigenvectors, associated to the largest eigenvalues, expressed in terms of the time
series of the companies. They are obtained as follows: Let $V$ be a matrix
with columns being the (center of mass) coordinates of the normalized
eigenvectors and $C$ a matrix containing as rows the (center of mass) coordinates
of companies 2 to 34. Then

$$ M = CV $$

is a matrix containing, as rows, the (center of mass) company coordinates
projected on the eigenvectors. The factors, that is, the leading eigenvectors
written in terms of the time series of the companies, are

$$ f_i(t) = \sum_n M^{-1}_{in} \, y_n^{(2:34)}(t) $$

where $y_n^{(2:34)}$ denotes the center of mass coordinates of the companies 2
to 34.

Performing these operations on our data set, we have obtained vanishing
$a_i$ intercepts ($< 10^{-7}$) and factor loadings and variances of the residual
random terms $\varepsilon_i$ as listed below. These variances are of order 50% of the
total variance of each company return. This might be considered too high a
value for a satisfactory factor model. However it corresponds closely to the
sum of the remaining 29 eigenvalues. These 29 eigenvalues are associated
to dimensions which cannot be distinguished from those of random data.
Therefore one concludes that no reliable improvement beyond the 5-factor
model is possible with this data.

In Fig. 5 we have plotted the contribution ($M^{-1}_{in}$) of each time series to
the factors.

Figure 5: The contribution of each time series to the factors



4 Clustering and market shocks 

Synchronization in the market plays an important role in the occurrence of
bubbles and crashes. Synchronization is at the root of the disproportionate
impact of public events relative to their intrinsic information content. This
applies to unanticipated public events but also to pre-scheduled news
announcements. Our clustering coefficient, as defined in Section 2.3, is indeed a
measure of synchronization in the market and as such may provide
information independent from other market indicators. Not being constructed from
a reduction to a minimum spanning tree, continuous clustering, as we have
defined it, retains the full information in the distance matrix on market
synchronization. As a first step towards a study of the role of this coefficient
we have studied it for a subset of 25 companies, for which we had much longer
time series available. We define volatility as the standard deviation of the
returns and use centered time windows of 5 and 7 days.
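
The windowed computation can be sketched as follows. Several details are assumptions of the example: the window length is odd, the correlation-based distance is recomputed inside each centered window, and volatility is taken as the per-company standard deviation averaged over companies (the text does not specify how it is aggregated). All function names are hypothetical.

```python
import numpy as np

def clustering(d):
    """Continuous clustering coefficient of a distance matrix."""
    n = d.shape[0]
    d_bar = d[~np.eye(n, dtype=bool)].mean()
    V = np.exp(-d / d_bar)
    np.fill_diagonal(V, 0.0)
    return np.trace(V @ V @ V) / (n * (n - 1) * (n - 2))

def rolling_clustering_volatility(returns, window=5):
    """Clustering and volatility in centered windows.

    returns: (T, N) array of daily returns; window: odd number of days.
    Entries too close to either end of the sample are left as NaN.
    """
    returns = np.asarray(returns, dtype=float)
    T = returns.shape[0]
    half = window // 2
    C_t = np.full(T, np.nan)
    vol_t = np.full(T, np.nan)
    for t in range(half, T - half):
        w = returns[t - half : t + half + 1]        # centered window
        corr = np.corrcoef(w, rowvar=False)
        d = np.sqrt(np.clip(2.0 * (1.0 - corr), 0.0, None))
        C_t[t] = clustering(d)
        vol_t[t] = w.std(axis=0).mean()             # averaged over companies
    return C_t, vol_t
```

A series that is constant within a window has undefined correlations, so such windows would produce NaN values and need separate handling in real data.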

In Fig. 6 we compare clustering ($C$) and volatility ($\sigma$) for the period
September 1980 - August 2000 with a time window ($w$) of 5 days. One
notices that most (not all) volatility peaks also correspond to clustering peaks.
However, there are many periods of high clustering which are not associated
to very large volatility. This effect is statistically robust, in the sense that
it remains for much larger time windows. In most cases where there are
simultaneous volatility and clustering peaks, clustering decays faster than
volatility. Although volatility remains high, synchronization fades out faster
after the initial shock. There are exceptions, though (see below).

Figure 6: Clustering and volatility for the period September 1980 - August
2000 with a time window of 5 days

In Figs. 7, 8 and 9 we have expanded the periods September 1987 - 
January 1988, August 1990 - October 1990 and September 1997 - December 
1997 using time windows of 5 and 7 days. In Fig. 7 one sees that around 
October 19, 1987 (Black Monday) there are both clustering and volatility 
peaks, but that clustering (synchronization) decays faster than volatility. In 
addition there is around January 6, 1988 another clustering peak that is not 
accompanied by exceptionally high volatility. Another interesting example 
is provided by Fig. 8 where one sees a clustering peak at around August 
15, 1990 with small volatility and a volatility peak after September 5, 1990 
without increase in the clustering. Finally, Fig. 9 shows that around October 
27, 1997 (2nd Black Monday - Asian crisis) clustering and volatility have very 
similar behavior. 

The main conclusion is that clustering indeed provides some new
information on the market which is independent from the one provided by volatility.
Together they provide insight on the different types of market shocks.

Figure 7: Clustering and volatility for the period September 1987 - January
1988 with time windows of 5 and 7 days

5 Conclusions 

(i) The main result from our empirical study of the market geometric struc- 
ture is the dimension reduction that is observed, when compared with the 
number of companies of different sectors that are analyzed. This may have 
useful implications for economic modelling and the identification of subspaces 
and characteristic dimensions may provide a rationale for the search for eco- 
nomic factors which are neither sectors nor other obvious economic facts. 

(ii) Underlying all modern views of asset pricing and portfolio selection
is the idea that unsystematic risk may be eliminated by diversification. A
large diversification (comparable to the whole market) involves large costs
and efficient managing. It would be much simpler to have a small number of
partially anticorrelated stocks. In addition to providing a rationale for the
choice of the number of terms in factor models, our approach also suggests
what might be called a dimension-by-dimension (DBD) hedging strategy,
where diversification is not achieved by mimicking the market portfolio, but
by balancing the stocks in appropriate amounts in a few dimensions.

Figure 8: Clustering and volatility for the period August 1990 - October 1990
with time windows of 5 and 7 days

(iii) In our example (but not necessarily in the method) we have con- 
centrated on stocks. Nowadays there are on the market a myriad of other 
more or less risky assets. In principle the same method also applies to other 
financial instruments and it may turn out that the nature of the economic 
spaces reconstructed from different asset types will give us different views on 
the over-all economic space. 

(iv) At a more ambitious level one might think that, once the dimensions
of the economic space are identified, a framework is available to establish
dynamical equations for the market process. However, one should remember
that the bulk of the market fluctuation process seems to be a short-memory
process with a very small long-memory component [12], which is nevertheless
very important for practical purposes, because it is associated with the large
fluctuations of the returns. Therefore separation of the components and
reconstruction of their characteristic spaces might be an essential precondition
for establishing any meaningful market dynamics description.

(v) There is a great deal of controversy over experimental tests of the
market efficiency hypothesis in its weak, semistrong and strong versions.
At the theoretical level the modern view of the hypothesis states [13] that
market overreaction in some circumstances and underreaction in others is a
pure chance event. In other words, the expected value of abnormal returns
is zero. Other views state that a behavioral component [14] must always be
included in any description of the market. Behavioral trends, however, may
not be inconsistent with a pure statistical description if the different reaction
times and secondary reactions are taken into account [15].

Figure 9: Clustering and volatility for the period September 1997 - December
1997 with time windows of 5 and 7 days

Our results do suggest the existence of a certain amount of structure in
the market. However, it is a result neither in favor of nor against the market
efficiency hypothesis because even if, by careful consideration of the market
structure along the lines we propose in this paper, dimensions and the
ambient manifold become well defined, no conclusion can be drawn on the nature
of the stochastic process that is taking place there.

(vi) Finally, an important spillover from our metric discussion of the 
market structure is the notion of continuous clustering which may provide 
useful insight on synchronization and market shocks. 